{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3.2. 翻译课程内容\n",
    "\n",
    "## 🚄 前言\n",
    "在上一节，我们成功实现了用大模型开发课程内容的能力。我们制作出了课程大纲和插图。接下来，通过国际化，你的课程能够更广泛地传播，进而影响更多学习者。本节课程介绍如何通过反思方法改善较低成本的文本生成模型 Qwen-Turbo的输出质量，从而将课程内容高性价比地翻译为多种语言。\n",
    "\n",
    "## 🍁 课程目标\n",
    "学完本课程后，你将能够：\n",
    "* 了解文本生成模型 Qwen-Turbo\n",
    "* 了解反思方法，并使用反思法改进 Qwen-Turbo 的输出质量\n",
    "  \n",
    "## 📖 课程目录\n",
    "\n",
    "- [1. 原理介绍](#🧮-1-原理介绍)\n",
    "- [2. 代码实践](#🛠️-2-代码实践)\n",
    "    - [2.1. 环境准备](#21-环境准备)\n",
    "    - [2.2. 设置 API 客户端](#22-设置-api-客户端)\n",
    "    - [2.3. 初步翻译](#23-初步翻译)\n",
    "    - [2.4. 反思翻译](#24-反思翻译)\n",
    "    - [2.5. 优化翻译](#25-优化翻译)\n",
    "  \n",
    "\n",
    "## 🧮 1. 原理介绍\n",
    "\n",
    "本次课程使用了文本生成模型 [Qwen-Turbo](https://bailian.console.aliyun.com/#/model-market/detail/qwen-turbo)。它是通义千问系列速度最快、成本很低的模型，适合简单任务。\n",
    "\n",
    "我们可以通过[反思方法](https://arxiv.org/abs/2303.11366)改善 Qwen-Turbo 的输出质量。该方法通过“生成-评估-优化”的过程实现持续改进提高模型的输出质量：\n",
    "\n",
    "<div align=\"center\">\n",
    "    <img src=\"https://gw.alicdn.com/imgextra/i1/O1CN01JkN0lj1F4otLyToYM_!!6000000000434-0-tps-1004-250.jpg\" alt=\"流程\" width=\"35%\"/>\n",
    "</div>\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "使用反思方法提高课程内容翻译质量的过程如下：\n",
    "\n",
    "<div align=\"center\">\n",
    "    <img src=\"https://gw.alicdn.com/imgextra/i2/O1CN01NjJozB1nrXxtiKQZY_!!6000000005143-0-tps-2146-170.jpg\" alt=\"流程\" width=\"80%\"/>\n",
    "</div>\n",
    "\n",
    "\n",
    "## 🛠️ 2. 代码实践\n",
    "\n",
    "接下来，让我们运行以下代码，将上一节课生成的课程内容翻译成英式英语。\n",
    "\n",
    "### 2.1. 环境准备\n",
    "\n",
    "1. 安装 Python 库。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2024-10-09T11:55:44.259618Z",
     "iopub.status.busy": "2024-10-09T11:55:44.259230Z",
     "iopub.status.idle": "2024-10-09T11:55:48.585984Z",
     "shell.execute_reply": "2024-10-09T11:55:48.585363Z",
     "shell.execute_reply.started": "2024-10-09T11:55:44.259580Z"
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "! pip install -r requirements.txt -q"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. 导入必要的模块。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2024-10-09T11:55:48.587305Z",
     "iopub.status.busy": "2024-10-09T11:55:48.587027Z",
     "iopub.status.idle": "2024-10-09T11:55:48.934686Z",
     "shell.execute_reply": "2024-10-09T11:55:48.934191Z",
     "shell.execute_reply.started": "2024-10-09T11:55:48.587285Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import os\n",
    "from typing import List, Union, Tuple\n",
    "import openai\n",
    "import json\n",
    "import dashscope\n",
    "from utils import create_directory, save_file, read_text_from_file,load_config"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3. 设置环境变量"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "sys.path.append(\"../\")\n",
    "from config.load_key import load_key\n",
    "load_key()\n",
    "print(os.environ[\"DASHSCOPE_API_KEY\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "4. 加载配置文件。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-09T11:56:01.059356Z",
     "iopub.status.busy": "2024-10-09T11:56:01.059025Z",
     "iopub.status.idle": "2024-10-09T11:56:01.064758Z",
     "shell.execute_reply": "2024-10-09T11:56:01.064154Z",
     "shell.execute_reply.started": "2024-10-09T11:56:01.059333Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "project_config = load_config(\"config.json\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "### 2.2. 设置 API 客户端\n",
    "设置 OpenAI 的 API 客户端，用于后续调用阿里云百炼的 Qwen-Max 模型和 Flux-Merged 模型。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2024-10-09T11:56:02.885684Z",
     "iopub.status.busy": "2024-10-09T11:56:02.885348Z",
     "iopub.status.idle": "2024-10-09T11:56:02.932705Z",
     "shell.execute_reply": "2024-10-09T11:56:02.932275Z",
     "shell.execute_reply.started": "2024-10-09T11:56:02.885663Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "client = openai.OpenAI(\n",
    "    api_key=os.getenv(\"DASHSCOPE_API_KEY\"),\n",
    "    base_url=\"https://dashscope.aliyuncs.com/compatible-mode/v1\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "### 2.3. 初步翻译\n",
    "\n",
    "首先，我们让文本生成模型 Qwen-Turbo 扮演一位翻译，生成初始的译文。\n",
    "\n",
    "1. 定义一个 `create_initial_translation` 函数，用于使用 Qwen-Turbo 将课程内容从源语言直接翻译为目标语言。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2024-10-09T11:58:34.974606Z",
     "iopub.status.busy": "2024-10-09T11:58:34.974158Z",
     "iopub.status.idle": "2024-10-09T11:58:34.979681Z",
     "shell.execute_reply": "2024-10-09T11:58:34.979027Z",
     "shell.execute_reply.started": "2024-10-09T11:58:34.974568Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def create_initial_translation(source_lang, target_lang, source_text):\n",
    "    \"\"\"\n",
    "    将源文本从指定语言翻译成目标语言，并返回翻译结果。\n",
    "\n",
    "    参数：\n",
    "    - source_lang: str，源语言代码（如 'zh' 表示中文）。\n",
    "    - target_lang: str，目标语言代码（如 'en' 表示英语）。\n",
    "    - source_text: str，待翻译的源文本。\n",
    "\n",
    "    返回：\n",
    "    - str，翻译后的文本。\n",
    "    \"\"\"\n",
    "\n",
    "    # 创建系统消息，指示助手的角色和任务\n",
    "    system_message = f\"您是一位专业翻译，专注于将 {source_lang} 翻译成 {target_lang}。\"\n",
    "\n",
    "    # 将源文本和翻译任务包含在提示中，要求只输出翻译内容\n",
    "    prompt = f\"\"\"您的任务是将以下文本：{source_text}从{source_lang}翻译为{target_lang}。请仅输出翻译内容，无需附加其他信息。图片路径不需要翻译。输出格式为 Markdown。\"\"\"\n",
    "\n",
    "    # 通过 API 调用翻译服务，生成翻译结果\n",
    "    completion = client.chat.completions.create(\n",
    "        model=\"qwen-turbo\",\n",
    "        messages=[\n",
    "            {\"role\": \"system\", \"content\": system_message},\n",
    "            {\"role\": \"user\", \"content\": prompt},\n",
    "        ],\n",
    "    )\n",
    "\n",
    "    # 如果 model_dump_json() 返回字符串需要解析为 JSON 格式\n",
    "    dumped_json = json.loads(completion.model_dump_json())\n",
    "\n",
    "    # 从 JSON 获取生成的翻译内容\n",
    "    generated_content = dumped_json['choices'][0]['message']['content']\n",
    "    print(\"生成初步翻译:\", generated_content)\n",
    "\n",
    "    # 返回翻译后的文本\n",
    "    return generated_content"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. 调用 `create_initial_translation` 函数，将上节课生成的“云计算”课程脚本从中文翻译为英文。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2024-10-09T11:58:39.674741Z",
     "iopub.status.busy": "2024-10-09T11:58:39.674444Z",
     "iopub.status.idle": "2024-10-09T11:58:54.089321Z",
     "shell.execute_reply": "2024-10-09T11:58:54.088725Z",
     "shell.execute_reply.started": "2024-10-09T11:58:39.674720Z"
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# 读取配置文件中的课程脚本文件路径\n",
    "course_script_with_illustrations_file_path = project_config[\"course_script_with_illustrations_file_path\"].format(title=project_config[\"title\"])\n",
    "\n",
    "# 根据课程脚本文件路径读取课程内容\n",
    "script = read_text_from_file(course_script_with_illustrations_file_path)\n",
    "\n",
    "# 读取配置文件中的源语言、目标语言信息\n",
    "source_lang = project_config[\"source_lang\"]\n",
    "target_lang = project_config[\"target_lang\"]\n",
    "\n",
    "# 调用函数进行初步翻译\n",
    "initial_translation = create_initial_translation(source_lang, target_lang, script)\n",
    "\n",
    "# 读取配置文件中的初步翻译文件路径\n",
    "initial_translation_file_path = project_config[\"initial_translation_file_path\"].format(title=project_config[\"title\"])\n",
    "\n",
    "# 保存初步翻译结果\n",
    "save_file(initial_translation, initial_translation_file_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "### 2.4. 反思翻译\n",
    "\n",
    "接下来，我们让 Qwen-Turbo 模型扮演语言专家，反思初始译文并提供改进建议。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. 定义一个 `reflect_on_translation ` 函数，用于对译文进行反思。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "ExecutionIndicator": {
     "show": false
    },
    "execution": {
     "iopub.execute_input": "2024-10-09T11:58:59.413577Z",
     "iopub.status.busy": "2024-10-09T11:58:59.413234Z",
     "iopub.status.idle": "2024-10-09T11:58:59.418441Z",
     "shell.execute_reply": "2024-10-09T11:58:59.417900Z",
     "shell.execute_reply.started": "2024-10-09T11:58:59.413554Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def reflect_on_translation(source_lang, target_lang, source_text, translation, country):\n",
    "    system_message = (\n",
    "        f\"您是一位语言专家，专注于从 {source_lang} 翻译到 {target_lang}。\"\n",
    "        f\"您将获得源文本及其翻译，您的目标是改进翻译内容。\"\n",
    "    )\n",
    "\n",
    "    prompt = f\"\"\"\n",
    "    您的任务是仔细阅读从 {source_lang} 翻译到 {target_lang} 的源文本和翻译，然后提供建设性的批评和有用的建议，以改善翻译。\n",
    "\n",
    "    翻译的最终风格和语气应符合在 {country} 口语中使用的 {target_lang} 的风格。\n",
    "\n",
    "    源文本和翻译如下：\n",
    "    <SOURCE_TEXT>\n",
    "    {source_text}\n",
    "    </SOURCE_TEXT>\n",
    "    <TRANSLATION>\n",
    "    {translation}\n",
    "    </TRANSLATION>\n",
    "\n",
    "    在写建议时，请注意是否有改善翻译的方式：\n",
    "    (i) 准确性（通过纠正添加、误译、遗漏或未翻译文本的错误），\n",
    "    (ii) 流畅性（通过应用 {target_lang} 的语法、拼写和标点规则，确保没有不必要的重复），\n",
    "    (iii) 风格（确保翻译反映源文本的风格，并考虑任何文化背景），\n",
    "    (iv) 术语（确保术语使用的一致性，反映源文本的领域；并仅使用与 {target_lang} 等价的习语）。\n",
    "\n",
    "    写出一系列具体、有帮助和建设性的建议，以改进翻译。\n",
    "    每个建议应针对翻译的一个具体部分。\n",
    "    只输出建议，不加入任何额外的解释内容。\n",
    "    输出格式为 Markdown。\n",
    "    \"\"\"\n",
    "\n",
    "    # 直接调用 API 获取课程脚本\n",
    "    completion = client.chat.completions.create(\n",
    "        model=\"qwen-turbo\",\n",
    "        messages=[\n",
    "            {\"role\": \"system\", \"content\": system_message},\n",
    "            {\"role\": \"user\", \"content\": prompt},\n",
    "        ],\n",
    "    )\n",
    "\n",
    "    # 如果 model_dump_json() 返回字符串需要解析为 JSON\n",
    "    dumped_json = json.loads(completion.model_dump_json())\n",
    "\n",
    "    generated_content = dumped_json['choices'][0]['message']['content']\n",
    "    print(\"生成翻译反思:\", generated_content)\n",
    "\n",
    "    # 返回翻译反思\n",
    "    return generated_content"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. 调用 `reflect_on_translation` 函数生成改进建议。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2024-10-09T11:59:14.357082Z",
     "iopub.status.busy": "2024-10-09T11:59:14.356733Z",
     "iopub.status.idle": "2024-10-09T11:59:22.406378Z",
     "shell.execute_reply": "2024-10-09T11:59:22.405891Z",
     "shell.execute_reply.started": "2024-10-09T11:59:14.357060Z"
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# 读取配置文件中的国家信息\n",
    "country = project_config[\"country\"]\n",
    "\n",
    "# 调用函数进行反思\n",
    "reflection_on_translation = reflect_on_translation(source_lang, target_lang, script, initial_translation, country)\n",
    "\n",
    "# 读取配置文件中的翻译反思文件路径\n",
    "reflection_on_translation_file_path = project_config[\"reflection_on_translation_file_path\"].format(title=project_config[\"title\"])\n",
    "\n",
    "# 保存翻译反思结果\n",
    "save_file(reflection_on_translation, reflection_on_translation_file_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "### 2.5. 优化翻译"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接下来，我们让 Qwen-Turbo 模型扮演译审专家，进行译后编辑。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. 定义一个 `improve_illustration` 函数，用于优化翻译。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2024-10-09T12:00:10.328500Z",
     "iopub.status.busy": "2024-10-09T12:00:10.328000Z",
     "iopub.status.idle": "2024-10-09T12:00:10.334634Z",
     "shell.execute_reply": "2024-10-09T12:00:10.334091Z",
     "shell.execute_reply.started": "2024-10-09T12:00:10.328470Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def improve_translation(source_lang, target_lang, source_text, initial_translation, reflection):\n",
    "    \"\"\"\n",
    "    改进从源语言翻译到目标语言的文本翻译。\n",
    "\n",
    "    此函数通过使用初始翻译和改进建议，生成更准确和流畅的翻译文本。\n",
    "    函数将通过 API 调用与翻译模型进行交互，利用系统消息、源文本、初始翻译和反思建议生成最终翻译。\n",
    "\n",
    "    参数:\n",
    "    source_lang (str): 源语言的名称或代码，例如 '中文' 或 'en'。\n",
    "    target_lang (str): 目标语言的名称或代码，例如 '英文' 或 'fr'。\n",
    "    source_text (str): 需要翻译的源文本内容。\n",
    "    initial_translation (str): 初始翻译文本，用于比较和改进。\n",
    "    reflection (str): 专家的改进建议和建设性批评，供文本编辑参考。\n",
    "\n",
    "    返回:\n",
    "    str: 改进后的翻译文本，仅包含新的翻译内容。\n",
    "\n",
    "    注意:\n",
    "    函数只输出最终翻译结果，并不附加其他内容。输出格式为 Markdown。\n",
    "    \"\"\"\n",
    "\n",
    "    # 定义系统消息，描述编辑任务\n",
    "    system_message = (\n",
    "        f\"您是一位高级编辑，专注于从 {source_lang} 翻译到 {target_lang} 的后期编辑。\"\n",
    "        f\"您将获得源文本、初始翻译和建议，您的目标是改进翻译内容。\"\n",
    "    )\n",
    "\n",
    "    # 构建用于请求的提示信息\n",
    "    prompt = f\"\"\"\n",
    "    您的任务是仔细阅读并编辑从 {source_lang} 翻译到 {target_lang} 的翻译，\n",
    "    同时考虑专家建议和建设性批评的列表。\n",
    "    源文本：\n",
    "    <SOURCE_TEXT>\n",
    "    {source_text}  # 源文本内容\n",
    "    </SOURCE_TEXT>\n",
    "    初始翻译：\n",
    "    <INITIAL_TRANSLATION>\n",
    "    {initial_translation}  # 初始翻译内容\n",
    "    </INITIAL_TRANSLATION>\n",
    "    改进建议：\n",
    "    <SUGGESTIONS>\n",
    "    {reflection}  # 提供的反思或改善建议\n",
    "    </SUGGESTIONS>\n",
    "\n",
    "    请在编辑翻译时考虑专家建议。编辑翻译时请确保：\n",
    "    (i) 准确性（纠正添加、误翻、遗漏或未翻译文本的错误），\n",
    "    (ii) 流畅性（应用 {target_lang} 的语法、拼写和标点规则，确保没有不必要的重复），\n",
    "    (iii) 风格（确保翻译反映源文本的风格），\n",
    "    (iv) 术语（不适合上下文或使用不一致），\n",
    "    (v) 其他错误。\n",
    "    只输出新的翻译，不附加其他内容。\n",
    "    输出格式为Markdown。\n",
    "    \"\"\"\n",
    "\n",
    "    # 直接调用 API 获取课程脚本\n",
    "    completion = client.chat.completions.create(\n",
    "        model=\"qwen-turbo\",  # 使用的模型\n",
    "        messages=[\n",
    "            {\"role\": \"system\", \"content\": system_message},  # 系统消息\n",
    "            {\"role\": \"user\", \"content\": prompt},  # 用户提示\n",
    "        ],\n",
    "    )\n",
    "\n",
    "    # 解析从 API 返回的 JSON 数据\n",
    "    dumped_json = json.loads(completion.model_dump_json())\n",
    "    # 提取生成的内容\n",
    "    generated_content = dumped_json['choices'][0]['message']['content']\n",
    "\n",
    "    # 打印生成的优化后的翻译\n",
    "    print(\"生成优化后的翻译:\", generated_content)\n",
    "\n",
    "    # 返回优化后的翻译内容\n",
    "    return generated_content"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. 调用 `improve_illustration` 函数优化翻译。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2024-10-09T12:00:12.335116Z",
     "iopub.status.busy": "2024-10-09T12:00:12.334756Z",
     "iopub.status.idle": "2024-10-09T12:00:26.515226Z",
     "shell.execute_reply": "2024-10-09T12:00:26.514728Z",
     "shell.execute_reply.started": "2024-10-09T12:00:12.335091Z"
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# 调用函数进行反思\n",
    "improved_translation = improve_translation(source_lang, target_lang, script, initial_translation, reflection_on_translation)\n",
    "\n",
    "# 读取配置文件中的优化翻译文件路径\n",
    "improved_translation_file_path = project_config[\"improved_translation_file_path\"].format(title=project_config[\"title\"])\n",
    "\n",
    "# 保存优化后的翻译结果\n",
    "save_file(improved_translation, improved_translation_file_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "## ✅ 本节小结\n",
    "\n",
    "- 在本次学习和实践中，我们了解了低成本的 Qwen-Turbo，并通过反思方法改进了 Qwen-Turbo 的输出质量。\n",
    "- 除了国际化的需求外，将课程内容转换为 PPT 进行授课也是非常常见的需求。接下来，我们将学习如何使用大模型结合工具将课程内容转换为 PPT。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## ✉️ 评价反馈\n",
    "感谢你学习阿里云大模型ACP认证课程，如果你觉得课程有哪里写得好、哪里写得不好，期待你[通过问卷提交评价和反馈](https://survey.aliyun.com/apps/zhiliao/Mo5O9vuie)。\n",
    "\n",
    "你的批评和鼓励都是我们前进的动力。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
